Weaving sloWNet using window-based co-occurrence features
Abstract
This paper presents the first results of using statistical methods and linguistically annotated corpus data to extract lists of semantically similar words, which are then incorporated into an existing wordnet for Slovene. The approach was originally developed for Polish but is attractive for other languages as well: apart from a large corpus, it requires only minimal NLP tools and resources, so it can be applied easily to any language that still lacks an extensive wordnet or a similar semantic lexicon. Another important advantage is that the approach relies on real linguistic evidence harvested from a corpus, which yields a linguistically sound organization of the vocabulary. All previous approaches to constructing the Slovene wordnet were transfer-based and relied on the English Princeton WordNet, so the encouraging results of the presented experiment will be a welcome complement to the existing semantic network.
Slovene title: Spletanje sloWNeta na podlagi informacij o sopojavljanju besed v korpusu
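The abstract does not spell out the extraction procedure, so the following is only a minimal sketch of the general idea behind window-based co-occurrence features, not the authors' method. It assumes a tokenized (ideally lemmatized) corpus, raw co-occurrence counts within a symmetric window, and cosine similarity as the ranking measure. The function names, the window size, and the toy English corpus are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Count, for every word, the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the target position itself
                    vectors[target][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[word] for word, count in u.items() if word in v)
    norm_u = sqrt(sum(count * count for count in u.values()))
    norm_v = sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, top_n=5):
    """Rank all other words by cosine similarity to `word`."""
    target = vectors.get(word, Counter())
    scores = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    # Highest similarity first; ties broken alphabetically for stable output.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


# Hypothetical toy corpus; a real experiment would use a large annotated corpus.
corpus = [
    ["the", "cat", "drinks", "milk"],
    ["the", "dog", "drinks", "water"],
    ["the", "cat", "chases", "the", "mouse"],
    ["the", "dog", "chases", "the", "cat"],
]

vectors = cooccurrence_vectors(corpus, window=2)
candidates = most_similar("cat", vectors, top_n=3)
```

In a setting like the one described, the ranked candidates for each word would be the raw material that is then reviewed and linked into the wordnet. Raw counts are used here only for brevity; weighting schemes and filtering by part of speech are common refinements.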